Embedding-Based Speaker Adaptive Training of Deep Neural Networks
نویسندگان
چکیده
An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker, are mapped through a control network to layer-dependent elementwise affine transformations to canonicalize the internal feature representations at the output of hidden layers of a main network. The control network for generating the speaker-dependent mappings are jointly estimated with the main network for the overall speaker adaptive acoustic modeling. Experiments on large vocabulary continuous speech recognition (LVCSR) tasks show that the proposed SAT scheme can yield superior performance over the widely-used speaker-aware training using i-vectors with speaker-adapted input features.
منابع مشابه
شبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملCystoscopy Image Classication Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attractedthe attention of many researchers. However, no smart activity has been provided in the eld ofmedical image processing for diagnosis of bladder cancer through cystoscopy images despite the highprevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...
متن کاملAn adaptive estimation method to predict thermal comfort indices man using car classification neural deep belief
Human thermal comfort and discomfort of many experimental and theoretical indices are calculated using the input data the indicator of climatic elements are such as wind speed, temperature, humidity, solar radiation, etc. The daily data of temperature، wind speed، relative humidity، and cloudiness between the years 1382-1392 were used. In the First step، Tmrt parameter was calculated in the Ray...
متن کاملSpeaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold Using Deep Neural Networks with an Evaluation on Speaker Segmentation
This paper presents a novel approach, we term Speaker2Vec, to derive a speaker-characteristics manifold learned in an unsupervised manner. The proposed representation can be employed in different applications such as diarization, speaker identification or, as in our evaluation test case, speaker segmentation. Speaker2Vec exploits large amounts of unlabeled training data and the assumption of sh...
متن کاملLink Prediction using Network Embedding based on Global Similarity
Background: The link prediction issue is one of the most widely used problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. The link prediction local approaches with node structure objectives are fast in case of speed but are not accurate enough. On the other hand, the global link predicti...
متن کامل